Financial Contributions to Presidential Campaigns (Ohio State)

Dataset: Financial Contributions to Presidential Campaigns (Ohio State)
Time: 2016
Reason for choosing this dataset: Ohio is known as a swing state, so its result is often read as a signal of the overall election outcome.

General R library and data loading

Univariate Plots Section

##       cmte_id           cand_id                           cand_nm     
##  C00575795:71194   P00003392:71194   Clinton, Hillary Rodham  :71194  
##  C00577130:34686   P60007168:34686   Sanders, Bernard         :34686  
##  C00580100:24166   P80001571:24166   Trump, Donald J.         :24166  
##  C00574624:16406   P60006111:16406   Cruz, Rafael Edward 'Ted':16406  
##  C00573519: 7937   P60005915: 7937   Carson, Benjamin S.      : 7937  
##  C00581876: 4824   P60003670: 4824   Kasich, John R.          : 4824  
##  (Other)  : 5262   (Other)  : 5262   (Other)                  : 5262  
##                   contbr_nm          contbr_city     contbr_st  
##  STOWE, JANICE         :   277   COLUMBUS  : 17328   OH:164475  
##  MISSLER, ANDREW J. MR.:   203   CINCINNATI: 15630              
##  BRIONES, BERTA        :   179   CLEVELAND :  5778              
##  MOESER, DONALD        :   176   DAYTON    :  4634              
##  CUMMINGS, JOHN        :   142   TOLEDO    :  3287              
##  SCHEEL, PATRICK       :   133   AKRON     :  3206              
##  (Other)               :163365   (Other)   :114612              
##    contbr_zip                     contbr_employer 
##  Min.   :       10   RETIRED              :27097  
##  1st Qu.:431109498   N/A                  :22434  
##  Median :440942900   SELF-EMPLOYED        : 8353  
##  Mean   :368573923   NONE                 : 7638  
##  3rd Qu.:450131451   INFORMATION REQUESTED: 7611  
##  Max.   :458969665   (Other)              :91213  
##  NA's   :3           NA's                 :  129  
##              contbr_occupation contb_receipt_amt  contb_receipt_dt 
##  RETIRED              :43434   Min.   :-10800    11-JUL-16:  2211  
##  NOT EMPLOYED         :10378   1st Qu.:    16    06-JUL-16:  2204  
##  INFORMATION REQUESTED: 7549   Median :    28    12-JUL-16:  1952  
##  ATTORNEY             : 3320   Mean   :   120    29-FEB-16:  1534  
##  HOMEMAKER            : 3234   3rd Qu.:    80    12-AUG-16:  1529  
##  (Other)              :96538   Max.   : 29100    31-MAR-16:  1463  
##  NA's                 :   22                     (Other)  :153582  
##                      receipt_desc    memo_cd   
##                            :162495    :127925  
##  Refund                    :   887   X: 36550  
##  REDESIGNATION FROM PRIMARY:   211             
##  REDESIGNATION TO GENERAL  :   210             
##  REATTRIBUTION TO SPOUSE   :   114             
##  REATTRIBUTION FROM SPOUSE :   112             
##  (Other)                   :   446             
##                                memo_text       form_tp      
##                                     :114599   SA17A:128232  
##  * EARMARKED CONTRIBUTION: SEE BELOW: 33677   SA18 : 35356  
##  * HILLARY VICTORY FUND             : 14385   SB28A:   887  
##  EARMARKED FROM MAKE DC LISTEN      :   282                 
##  *BEST EFFORTS UPDATE               :   246                 
##  REDESIGNATION FROM PRIMARY         :   211                 
##  (Other)                            :  1075                 
##     file_num                       tran_id       election_tp   
##  Min.   :1003942   A80E77D0E713E417AA88:     3        :   522  
##  1st Qu.:1077664   C11887628           :     3   G2016: 56271  
##  Median :1096260   C10225661           :     2   P2016:107682  
##  Mean   :1095976   C10228611           :     2                 
##  3rd Qu.:1119042   C10230213           :     2                 
##  Max.   :1134173   C10234145           :     2                 
##                    (Other)             :164461                 
##     party          
##  Length:164475     
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
## [1] 164475
## [1] 24
## [1] 6555
## [1] 1341

## # A tibble: 3 × 5
##   party count total_amount      pos    percent
##   <chr> <int>        <dbl>    <dbl>      <dbl>
## 1     R 58091     11510839  5755420 0.58322342
## 2     D 71241      6744478 14883079 0.34172466
## 3 Other 35143      1481269 18995952 0.07505192
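
The pos and percent columns in the table above are consistent with a label-position calculation often used for stacked bar or pie charts: percent is each party's share of the total, and pos is the midpoint of each party's segment in a cumulative stack. The following is a minimal sketch under that assumption. It rebuilds the two columns from the rounded totals printed above and is not necessarily the code that produced the table.

```r
# Totals copied from the printed table (shown there rounded to whole dollars).
party_totals <- data.frame(
  party        = c("R", "D", "Other"),
  total_amount = c(11510839, 6744478, 1481269)
)

# Share of the overall contributed amount.
party_totals$percent <- party_totals$total_amount / sum(party_totals$total_amount)

# Midpoint of each segment in a cumulative stack (assumed meaning of `pos`).
party_totals$pos <- cumsum(party_totals$total_amount) - party_totals$total_amount / 2

print(party_totals)
```

Because the inputs are rounded, the recomputed values can differ from the printed ones in the last digit.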

##  [1] COLUMBUS   COLUMBUS   CINCINNATI COLUMBUS   COLUMBUS   COLUMBUS  
##  [7] CINCINNATI AKRON      DAYTON     COLUMBUS   COLUMBUS   COLUMBUS  
## [13] TOLEDO     COLUMBUS   COLUMBUS  
## 1341 Levels:  BATAVIA 45320 ABERDEEN ADA ADAMS COUNTY ADDYSTON ... ZOAR
##  [1] COLUMBUS   COLUMBUS   CINCINNATI COLUMBUS   COLUMBUS   COLUMBUS  
##  [7] CINCINNATI AKRON      DAYTON     COLUMBUS   COLUMBUS   COLUMBUS  
## [13] TOLEDO     COLUMBUS   COLUMBUS  
## 10 Levels: COLUMBUS CINCINNATI CLEVELAND DAYTON TOLEDO ... LAKEWOOD

## # A tibble: 1,341 × 2
##       contbr_city total_amount
##            <fctr>        <dbl>
## 1      CINCINNATI    2605688.7
## 2        COLUMBUS    2226563.1
## 3       CLEVELAND     866239.9
## 4   CHAGRIN FALLS     383091.9
## 5          DUBLIN     379636.9
## 6  SHAKER HEIGHTS     376150.9
## 7           AKRON     358729.6
## 8          DAYTON     353846.1
## 9          CANTON     277801.2
## 10    WESTERVILLE     254291.7
## # ... with 1,331 more rows
## # A tibble: 10 × 2
##       contbr_city total_amount
##            <fctr>        <dbl>
## 1      CINCINNATI    2605688.7
## 2        COLUMBUS    2226563.1
## 3       CLEVELAND     866239.9
## 4   CHAGRIN FALLS     383091.9
## 5          DUBLIN     379636.9
## 6  SHAKER HEIGHTS     376150.9
## 7           AKRON     358729.6
## 8          DAYTON     353846.1
## 9          CANTON     277801.2
## 10    WESTERVILLE     254291.7

Univariate Analysis

What is the structure of your dataset?

There are 164,475 observations in the Ohio dataset, with 18 original variables. For analysis purposes, I added 3 extra variables (party, Month_Yr and weekday).

What is/are the main feature(s) of interest in your dataset?

The main feature of interest is “contb_receipt_amt”, together with the factors that influence it. I’d like to find out which features have the most impact on the amount contributed, and to offer a few suggestions for future candidates running an election fundraising campaign. I suspect that city, occupation and day of the week matter.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Since the result of the 2016 American presidential election is now known, it is worth comparing the contributed amounts with the final voting results. I downloaded the voting result data to analyze the correlation between contributed amount and the number of voters in Ohio. (This analysis is covered in the next section.)

Did you create any new variables from existing variables in the dataset?

Yes, I created 3 variables for further analysis:
1) party: contributions grouped into 3 categories (D, R, Other) based on candidate name
2) Month_Yr: the month of the contribution, used to show the contributed amount trend by month
3) weekday: the day of the week, used to check whether contributed amounts differ much between weekdays and weekends
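
The three derived variables could be built roughly as follows. This is a sketch, not the original code: assign_party is a hypothetical helper whose lookup covers only the candidates visible in the printed summary, the is_weekend flag is an extra convenience column that is not part of the dataset, and the date parsing assumes an English locale for the month abbreviations in values such as "11-JUL-16".

```r
# Hypothetical helper: map a candidate name to the party label used in this report.
# Any candidate not listed falls into "Other", as Sanders rows do in the printed output.
assign_party <- function(cand_nm) {
  republicans <- c("Trump, Donald J.", "Cruz, Rafael Edward 'Ted'",
                   "Carson, Benjamin S.", "Kasich, John R.")
  democrats   <- c("Clinton, Hillary Rodham")
  ifelse(cand_nm %in% republicans, "R",
         ifelse(cand_nm %in% democrats, "D", "Other"))
}

# Toy rows in the same shape as the dataset (illustrative values only).
contrib <- data.frame(
  cand_nm          = c("Clinton, Hillary Rodham", "Trump, Donald J.", "Sanders, Bernard"),
  contb_receipt_dt = c("11-JUL-16", "09-JUL-16", "29-FEB-16"),
  stringsAsFactors = FALSE
)

# Parse dates such as "11-JUL-16"; %b assumes English month abbreviations.
receipt_date <- as.Date(contrib$contb_receipt_dt, format = "%d-%b-%y")

contrib$party    <- assign_party(contrib$cand_nm)
contrib$Month_Yr <- format(receipt_date, "%Y-%m")
# %u is the ISO weekday number (1 = Monday ... 7 = Sunday), independent of locale.
contrib$weekday    <- as.integer(format(receipt_date, "%u"))
contrib$is_weekend <- contrib$weekday >= 6

print(contrib)
```

Storing the weekday as an ISO number rather than a name keeps the weekday/weekend comparison independent of the system language.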

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I enriched the Ohio dataset with zip code data to visualize the contributed amount on a map of Ohio. (This analysis is conducted in the Multivariate Plots Section.)

After merging with the Ohio zip code data from the zipcode library, I found 83 entries with potentially wrong zip codes, so I excluded them when plotting the contributed amount on the map. I excluded them because it is hard to identify the correct zip code based on city names alone.
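
One way to flag such entries is to reduce each contbr_zip value to a 5-digit zip code and check it against a reference list. The sketch below assumes contbr_zip is numeric and holds either a 5-digit zip or a 9-digit ZIP+4 value, as in the printed rows. The helper to_zip5 and the short ohio_zips vector are hypothetical stand-ins for the real reference data.

```r
# Hypothetical helper: reduce a numeric ZIP or ZIP+4 value to a 5-digit string.
# Values with any other number of digits, and missing values, become NA.
to_zip5 <- function(zip) {
  z   <- sprintf("%.0f", zip)
  out <- rep(NA_character_, length(z))
  is9 <- !is.na(zip) & nchar(z) == 9
  is5 <- !is.na(zip) & nchar(z) == 5
  out[is9] <- substr(z[is9], 1, 5)
  out[is5] <- z[is5]
  out
}

# Illustrative input: two plausible values, one too short, one missing.
contbr_zip <- c(451359416, 45249, 10, NA)

# Stand-in for the reference list of valid Ohio zip codes (assumed, not complete).
ohio_zips <- c("43214", "45135", "45249")

zip5     <- to_zip5(contbr_zip)
is_valid <- zip5 %in% ohio_zips   # NA never matches, so it counts as invalid

n_excluded <- sum(!is_valid)
print(data.frame(contbr_zip, zip5, is_valid))
```

Rows where is_valid is FALSE are the ones that would be left off the map.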

Bivariate Plots Section

## Source: local data frame [2,475 x 4]
## Groups: contbr_city [?]
## 
##     contbr_city party count total_amount
##           <chr> <chr> <int>        <dbl>
## 1       batavia     R     1       500.00
## 2         45320     R     1        80.00
## 3      aberdeen     D     5       900.00
## 4      aberdeen     R     2        44.00
## 5           ada     D    97      4272.00
## 6           ada Other    43      3682.88
## 7           ada     R    18      1458.00
## 8  adams county     R     1        80.00
## 9      addyston     D    11       392.55
## 10     addyston     R     3       190.00
## # ... with 2,465 more rows
##    contbr_city       D   Other    R
## 1      batavia    0.00    0.00  500
## 2        45320    0.00    0.00   80
## 3     aberdeen  900.00    0.00   44
## 4          ada 4272.00 3682.88 1458
## 5 adams county    0.00    0.00   80
## 6     addyston  392.55    0.00  190
##   contbr_city amount_D amount_Other amount_R votes_D votes_R total_votes
## 1       allen     0.00         0.00    80.00   12815   29858       44636
## 2     ashland  3927.04       657.00 20966.37    5659   17169       24074
## 3   ashtabula  4441.30      4928.40 14448.85   15191   22755       39809
## 4      athens 51808.95     18469.83 12721.48   15552   10816       27941
## 5     belmont  1759.50         0.00 39326.00    8652   20729       30537
## 6      butler   768.00      1028.60  1184.60   56700  104441      168422
##   total_amount
## 1        80.00
## 2     25550.41
## 3     23818.55
## 4     83000.26
## 5     41085.50
## 6      2981.20

## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

I noticed that the positive relationship between contributed amount and the number of voters is not strong. The relationship appears to be weak, which goes against my original assumption.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

When looking at the relationship between contributed amount and total voters, Republican party supporters show a stronger correlation than Democratic party supporters.

The correlation coefficients between contributed amount and voter numbers are:
1) Republican party: 0.401
2) Democratic party: 0.184

The Republican coefficient is higher than the correlation coefficient between total contributed amount and total voter numbers in Ohio (0.307), while the Democratic coefficient is lower.
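
Coefficients of this kind can be computed with cor(), which uses the Pearson method by default. The sketch below uses only the six rows printed in the Bivariate Plots Section, so it will not reproduce the values reported here. It also assumes that each party's amount is paired with that party's votes, which the report does not state explicitly.

```r
# First six rows of the merged table printed in the Bivariate Plots Section.
merged <- data.frame(
  contbr_city  = c("allen", "ashland", "ashtabula", "athens", "belmont", "butler"),
  amount_D     = c(0.00, 3927.04, 4441.30, 51808.95, 1759.50, 768.00),
  amount_R     = c(80.00, 20966.37, 14448.85, 12721.48, 39326.00, 1184.60),
  votes_D      = c(12815, 5659, 15191, 15552, 8652, 56700),
  votes_R      = c(29858, 17169, 22755, 10816, 20729, 104441),
  total_votes  = c(44636, 24074, 39809, 27941, 30537, 168422),
  total_amount = c(80.00, 25550.41, 23818.55, 83000.26, 41085.50, 2981.20)
)

# Pearson correlation between contributed amount and votes (assumed pairing).
cor_R     <- cor(merged$amount_R, merged$votes_R)
cor_D     <- cor(merged$amount_D, merged$votes_D)
cor_total <- cor(merged$total_amount, merged$total_votes)

print(c(R = cor_R, D = cor_D, total = cor_total))
```

With so few rows a single large value can dominate the result, which is one reason to compute the coefficients on the full merged table.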

What was the strongest relationship you found?

The strongest relationship is between the total contributed amount and the amount contributed to the Republican party (correlation coefficient 0.934), because contributions from Republican party supporters account for ~60% of the total.

However, this is not a proper pair for checking a relationship, because the 2 factors are not independent: the Republican amount is part of the total.
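
A small simulation illustrates why a component and the total it belongs to are correlated by construction. The data here are simulated, not taken from the contribution dataset, and the variable names are hypothetical. Two independent components are drawn, and the larger one still correlates strongly with their sum.

```r
set.seed(42)  # arbitrary seed so the simulation is repeatable

n <- 1000
part_large <- rnorm(n, mean = 0, sd = 3)  # dominant component of the total
part_small <- rnorm(n, mean = 0, sd = 1)  # drawn independently of part_large
total      <- part_large + part_small

# Correlation between a component and the total that contains it.
r_part_total <- cor(part_large, total)

# For independent components the expected value is
# sd_large / sqrt(sd_large^2 + sd_small^2).
r_expected <- 3 / sqrt(3^2 + 1^2)

print(c(observed = r_part_total, expected = r_expected))
```

The high correlation arises even though the two components carry no information about each other, so a value such as 0.934 says little beyond the Republican share of the total.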

Multivariate Plots Section

##                     cand_nm contbr_city contbr_zip contb_receipt_amt party
## 1 Cruz, Rafael Edward 'Ted'    LEESBURG  451359416             25.00     R
## 2 Cruz, Rafael Edward 'Ted'     MINERVA  446579402             25.00     R
## 3   Clinton, Hillary Rodham    COLUMBUS  432141210             40.00     D
## 4          Sanders, Bernard    COLUMBUS  432022420             50.00 Other
## 5   Clinton, Hillary Rodham     LEBANON  450365038             57.31     D
## 6          Sanders, Bernard  CINCINNATI      45249              2.50 Other
## [1] 27392
## [1] 27309
## [1] 83
## Source : https://maps.googleapis.com/maps/api/staticmap?center=ohio+state&zoom=7&size=640x640&scale=2&maptype=roadmap&language=en-EN
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=ohio%20state

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

I noticed that the major cities account for a larger contributed amount. Visualizing the data on the map clearly shows a few hot spots in Ohio.

Were there any interesting or surprising interactions between features?

After splitting the contributed amount by party, the data show that more funding went to the Republican party, which is reflected in the voting result: the Republican party won Ohio in the end.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

No. I tried to build a linear regression model using numeric and categorical data, but it failed; it seems to require a more complex statistical library.
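
For reference, base R's lm() accepts factor predictors directly and expands them into indicator columns, so a model with a numeric response and a categorical predictor may not need an extra library. The sketch below fits such a model on the six rows printed in the Multivariate Plots Section. It only shows the mechanics, and six rows are far too few for a meaningful fit.

```r
# Six rows copied from the printed output in the Multivariate Plots Section.
sample_rows <- data.frame(
  contb_receipt_amt = c(25.00, 25.00, 40.00, 50.00, 57.31, 2.50),
  party             = factor(c("R", "R", "D", "Other", "D", "Other"))
)

# Numeric response, categorical predictor; lm() builds the dummy variables itself.
fit <- lm(contb_receipt_amt ~ party, data = sample_rows)

# The intercept is the mean of the baseline level ("D", first alphabetically);
# the other coefficients are differences from that baseline mean.
print(coef(fit))
```

Contribution amounts are heavily skewed, so a transformation of the response would be worth considering before interpreting a fit on the full dataset.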


Final Plots and Summary

Plot One

Description One

The correlation coefficients between contributed amount and voter numbers are:
1) Republican party: 0.401 (plot 1-2)
2) Democratic party: 0.184 (plot 1-3)

The Republican coefficient is higher than the correlation coefficient between total contributed amount and total voter numbers in Ohio (0.307, plot 1-1), while the Democratic coefficient is lower.

Plot Two

Description Two

The analysis of contributed amount by day of the week shows lower contributed amounts on weekends. This might be because people tend to reserve their weekends for family. I would suggest setting up campaign stops at places where people like to go with their families on weekends. This might help increase the funds raised on weekends.

Plot Three

Description Three

The plot shows that the contributed money comes mainly from urban areas such as Columbus, Cleveland, Akron and Cincinnati. This helps candidates identify the cities to target in future campaigns to raise more funding.

I distinguish funding for the Republican party and the Democratic party by color in Plot 3-1. It shows that more funding went to the Republican party in Ohio, and the voting result also shows that the Republican party won Ohio.


Reflection

Before starting the analysis, I assumed that the contributed amount would be a strong indicator of the election result. After analyzing the relationship between the Ohio election result and the Ohio contribution data, I found that the correlation coefficient between these 2 factors is lower than I expected, so there does not appear to be a strong correlation between contributed amount and voter numbers.

However, this analysis covers only one state. As a next step, I would suggest analyzing the data for all U.S. states to see whether there is a strong relationship between these 2 factors.